Linguistic Linked Open Data (LLOD) Introduction and Overview
Abstract
The explosion of information technology has led to substantial growth in the quantity, diversity and complexity of linguistic data accessible over the internet. The lack of interoperability between linguistic and language resources represents a major challenge that needs to be addressed, in particular if information from different sources, such as machine-readable lexicons, corpus data and terminology repositories, is to be combined. For these types of resources, domain-specific standards have been proposed. Yet issues of interoperability between different types of resources persist, commonly accepted strategies to distribute, access and integrate their information have yet to be established, and technologies and infrastructures to address both aspects are still under development.

The goal of the 2nd Workshop on Linked Data in Linguistics (LDL-2013) has been to bring together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections, including corpora, dictionaries, lexical networks, translation memories and thesauri, as well as the infrastructures developed on that basis, their use of existing standards, and the publication and distribution policies that were adopted.

Background: Integrating Information from Different Sources

In recent years, the limited interoperability between linguistic resources has been recognized as a major obstacle for data use and re-use within and across discipline boundaries. After half a century of computational linguistics [8], quantitative typology [12], empirical, corpus-based study of language [10], and computational lexicography [16], researchers in computational linguistics, natural language processing (NLP) or information technology, as well as in Digital Humanities, are confronted with an immense wealth of linguistic resources that are growing not only in number but also in heterogeneity.

Interoperability involves two aspects [14]:

Structural ('syntactic') interoperability: Resources use comparable formalisms to represent and to access data (formats, protocols, query languages, etc.),
Similar resources
Three Birds (in the LLOD Cloud) with One Stone: BabelNet, Babelfy and the Wikipedia Bitaxonomy
In this paper we present the current status of linguistic resources published as linked data and linguistic services in the LLOD cloud in our research group, namely BabelNet, Babelfy and the Wikipedia Bitaxonomy. We describe them in terms of their salient aspects and objectives and discuss the benefits that each of these potentially brings to the world of LLOD NLP-aware services. We also presen...
OLiA - Ontologies of Linguistic Annotation
This paper describes the Ontologies of Linguistic Annotation (OLiA) as one of the data sets currently available as part of the Linguistic Linked Open Data (LLOD) cloud. Within the LLOD cloud, the OLiA ontologies serve as a reference hub for annotation terminology for linguistic phenomena across a wide range of languages; they have been used to facilitate interoperability and information integrati...
LVF-lemon: Towards a Linked Data Representation of "Les Verbes français"
In this study we elaborate a road map for the conversion of a traditional lexical syntactico-semantic resource for French into a linguistic linked open data (LLOD) model. Our approach uses current best-practices and the analyses of earlier similar undertakings (lemonUBY and PDEV-lemon) to tease out the most appropriate representation for our resource.
Towards the Representation of Hashtags in Linguistic Linked Open Data Format
A pilot study is reported on developing the basic Linguistic Linked Open Data (LLOD) infrastructure for hashtags from social media posts. Our goal is the encoding of linguistically and semantically enriched hashtags in a formally compact way using the machine-readable OntoLex model. Initial hashtag processing consists of data-driven decomposition of multi-element hashtags, the linking of spellin...
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud o...
Multilingual linked data
The interaction of natural language processing and the Semantic Web has led to the creation of a new paradigm known as Linguistic Linked Open Data (LLOD), whereby traditional language resources are made available as linked data. Conversely, the publication of corpora and machine-readable dictionaries as linked data has opened new resources to Semantic Web researchers and allowed new tools to be ...